measure value
The Eigenvalues Entropy as a Classifier Evaluation Measure
Classification is a machine learning method used in many practical applications: text mining, handwritten character recognition, face recognition, pattern classification, scene labeling, computer vision, and natural language processing. A classifier's prediction results and the training set information are often combined into a contingency table, which is used to quantify the quality of the method through an evaluation measure. Such a measure, typically a numerical value, makes it possible to choose a suitable method among several. Many evaluation measures available in the literature are less accurate for a dataset with imbalanced classes. In this paper, the eigenvalues entropy is used as an evaluation measure for a binary or a multi-class problem. For a binary problem, relations are given between the eigenvalues and some commonly used measures: sensitivity, specificity, the area under the receiver operating characteristic curve, and the Gini index. A by-product of this paper is an estimate of the confusion matrix to deal with the curse of imbalanced classes. Various data examples are used to show the better performance of the proposed evaluation measure over the gold standard measures available in the literature.
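The binary measures named in this abstract can be sketched from the four cells of a confusion matrix. The snippet below is a minimal illustration and not the paper's method: the function name `binary_summary` is hypothetical, the AUC is the single-operating-point value (sensitivity + specificity) / 2, and the choice to take eigenvalues of the row-normalized confusion matrix is an assumption, since the abstract does not state the exact construction. The entropy itself is not reproduced for the same reason.

```python
import numpy as np

def binary_summary(tp, fn, fp, tn):
    """Common binary measures plus eigenvalues of the row-normalized confusion matrix."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # AUC of the ROC curve that has a single operating point (assumption).
    auc = (sensitivity + specificity) / 2.0
    gini = 2.0 * auc - 1.0
    # Row-normalized confusion matrix (assumed construction, see lead-in).
    row_norm = np.array([
        [sensitivity, 1.0 - sensitivity],
        [1.0 - specificity, specificity],
    ])
    # Rows sum to 1, so one eigenvalue is 1 and the other is
    # sensitivity + specificity - 1.
    eigenvalues = np.sort(np.linalg.eigvals(row_norm).real)[::-1]
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "auc": auc,
        "gini": gini,
        "eigenvalues": eigenvalues,
    }

summary = binary_summary(tp=80, fn=20, fp=10, tn=90)
print(summary)
```

Under these assumptions the second eigenvalue equals the single-point Gini index, which is one simple way an eigenvalue can relate to the measures listed above.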
Properties of fairness measures in the context of varying class imbalance and protected group ratios
Brzezinski, Dariusz, Stachowiak, Julia, Stefanowski, Jerzy, Szczech, Izabela, Susmaga, Robert, Aksenyuk, Sofya, Ivashka, Uladzimir, Yasinskyi, Oleksandr
Society is increasingly relying on predictive models in fields like criminal justice, credit risk management, or hiring. To prevent such automated systems from discriminating against people belonging to certain groups, fairness measures have become a crucial component in socially relevant applications of machine learning. However, existing fairness measures have been designed to assess the bias between predictions for protected groups without considering the imbalance in the classes of the target variable. Current research on the potential effect of class imbalance on fairness focuses on practical applications rather than dataset-independent measure properties. In this paper, we study the general properties of fairness measures for changing class and protected group proportions. For this purpose, we analyze the probability mass functions of six of the most popular group fairness measures. We also measure how the probability of achieving perfect fairness changes for varying class imbalance ratios. Moreover, we relate the dataset-independent properties of fairness measures described in this paper to classifier fairness in real-life tasks. Our results show that measures such as Equal Opportunity and Positive Predictive Parity are more sensitive to changes in class imbalance than Accuracy Equality. These findings can help guide researchers and practitioners in choosing the most appropriate fairness measures for their classification problems.
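The three group fairness measures named at the end of this abstract can be illustrated with per-group confusion matrices. This is a minimal sketch under stated assumptions: each measure is taken as the difference of a per-group rate (true positive rate, positive predictive value, accuracy), which is a common formulation but may differ from the paper's exact definitions. The function names and the example counts are hypothetical.

```python
def rates(tp, fn, fp, tn):
    """Per-group rates used by the three measures (assumes non-empty denominators)."""
    return {
        "tpr": tp / (tp + fn),                    # true positive rate
        "ppv": tp / (tp + fp),                    # positive predictive value
        "acc": (tp + tn) / (tp + fn + fp + tn),   # accuracy
    }

def fairness_gaps(group_a, group_b):
    """Difference of each rate between two groups; 0 means perfect fairness."""
    ra, rb = rates(**group_a), rates(**group_b)
    return {
        "equal_opportunity": ra["tpr"] - rb["tpr"],
        "positive_predictive_parity": ra["ppv"] - rb["ppv"],
        "accuracy_equality": ra["acc"] - rb["acc"],
    }

# Hypothetical counts: group A is class-balanced, group B has few positives.
group_a = {"tp": 40, "fn": 10, "fp": 10, "tn": 40}
group_b = {"tp": 6, "fn": 4, "fp": 4, "tn": 86}
print(fairness_gaps(group_a, group_b))
```

In this made-up example the Equal Opportunity and Positive Predictive Parity rates for group B rest on only ten positive instances, so a change in a single prediction moves them by a large step, while accuracy is computed over all one hundred instances.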
Generic and Robust Root Cause Localization for Multi-Dimensional Data in Online Service Systems
Li, Zeyan, Chen, Junjie, Chen, Yihao, Luo, Chengyang, Zhao, Yiwei, Sun, Yongqian, Sui, Kaixin, Wang, Xiping, Liu, Dapeng, Jin, Xing, Wang, Qi, Pei, Dan
Localizing root causes for multi-dimensional data is critical to ensure online service systems' reliability. When a fault occurs, only the measure values within specific attribute combinations are abnormal. Such attribute combinations are substantial clues to the underlying root causes and thus are called root causes of multi-dimensional data. This paper proposes a generic and robust root cause localization approach for multi-dimensional data, PSqueeze. We propose a generic property of root cause for multi-dimensional data, generalized ripple effect (GRE). Based on it, we propose a novel probabilistic cluster method and a robust heuristic search method. Moreover, we identify the importance of determining external root causes and propose an effective method for the first time in the literature. Our experiments on two real-world datasets with 5400 faults show that the F1-score of PSqueeze outperforms baselines by 32.89%, while the localization time is around 10 seconds across all cases. PSqueeze achieves an F1-score of 0.90 in determining external root causes. Furthermore, case studies in several production systems demonstrate that PSqueeze is helpful for fault diagnosis in the real world.
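The problem setting in this abstract, where a measure is abnormal only within certain attribute combinations, can be shown with a toy aggregation. This is not PSqueeze and does not implement GRE, the clustering, or the search. It assumes each record carries a real value and a forecast (expected) value, and the attribute names `region` and `isp`, the data, and the function name are all hypothetical.

```python
from collections import defaultdict

def deviation_by_combination(leaves, attributes):
    """Relative deviation (real - forecast) / forecast per attribute combination."""
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [real sum, forecast sum]
    for leaf in leaves:
        key = tuple(leaf[a] for a in attributes)
        totals[key][0] += leaf["real"]
        totals[key][1] += leaf["forecast"]
    return {
        key: (real - forecast) / forecast
        for key, (real, forecast) in totals.items()
        if forecast != 0  # skip combinations with no expected traffic
    }

# Hypothetical records: one row per (region, isp) combination.
leaves = [
    {"region": "north", "isp": "A", "real": 50, "forecast": 100},
    {"region": "north", "isp": "B", "real": 40, "forecast": 80},
    {"region": "south", "isp": "A", "real": 99, "forecast": 100},
    {"region": "south", "isp": "B", "real": 81, "forecast": 80},
]
print(deviation_by_combination(leaves, ["region"]))
print(deviation_by_combination(leaves, ["isp"]))
```

In this made-up data, grouping by `region` isolates the drop in `north`, while grouping by `isp` spreads it across both values, which is why the choice of attribute combination matters for localization.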
Tim Kane: How do you measure value? And other great questions for Labor Day
As Americans celebrate Labor Day 2019, robots are stealing their jobs, as are immigrants, as are cheap imports from China. The first puzzle is: if all of these nefarious forces of free markets are stealing jobs, how is it that there are more Americans employed than ever before? Today, there are over 151 million workers on U.S. payrolls.